Convolutional Neural Networks

Project: Write an Algorithm for a Dog Identification App


In this notebook, some template code has already been provided for you, and you will need to implement additional functionality to successfully complete this project. You will not need to modify the included code beyond what is requested. Sections that begin with '(IMPLEMENTATION)' in the header indicate that the following block of code will require additional functionality which you must provide. Instructions will be provided for each section, and the specifics of the implementation are marked in the code block with a 'TODO' statement. Please be sure to read the instructions carefully!

Note: Once you have completed all of the code implementations, you need to finalize your work by exporting the Jupyter Notebook as an HTML document. Before exporting the notebook to html, all of the code cells need to have been run so that reviewers can see the final implementation and output. You can then export the notebook by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question X' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. Markdown cells can be edited by double-clicking the cell to enter edit mode.

The rubric contains optional "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. If you decide to pursue the "Stand Out Suggestions", you should include the code in this Jupyter notebook.


Why We're Here

In this notebook, you will make the first steps towards developing an algorithm that could be used as part of a mobile or web app. At the end of this project, your code will accept any user-supplied image as input. If a dog is detected in the image, it will provide an estimate of the dog's breed. If a human is detected, it will provide an estimate of the dog breed that is most resembling. The image below displays potential sample output of your finished project (... but we expect that each student's algorithm will behave differently!).

Sample Dog Output

In this real-world setting, you will need to piece together a series of models to perform different tasks; for instance, the algorithm that detects humans in an image will be different from the CNN that infers dog breed. There are many points of possible failure, and no perfect algorithm exists. Your imperfect solution will nonetheless create a fun user experience!

The Road Ahead

We break the notebook into separate steps. Feel free to use the links below to navigate the notebook.

  • Step 0: Import Datasets
  • Step 1: Detect Humans
  • Step 2: Detect Dogs
  • Step 3: Create a CNN to Classify Dog Breeds (from Scratch)
  • Step 4: Create a CNN to Classify Dog Breeds (using Transfer Learning)
  • Step 5: Write your Algorithm
  • Step 6: Test Your Algorithm

Step 0: Import Datasets

Make sure that you've downloaded the required human and dog datasets:

Note: if you are using the Udacity workspace, you DO NOT need to re-download these - they can be found in the /data folder as noted in the cell below.

  • Download the dog dataset. Unzip the folder and place it in this project's home directory, at the location /dog_images.

  • Download the human dataset. Unzip the folder and place it in the home directory, at location /lfw.

Note: If you are using a Windows machine, you are encouraged to use 7zip to extract the folder.

In the code cell below, we save the file paths for both the human (LFW) dataset and dog dataset in the numpy arrays human_files and dog_files.

In [1]:
import numpy as np
from glob import glob

# load filenames for human and dog images
human_files = np.array(glob("/data/lfw/*/*"))
dog_files = np.array(glob("/data/dog_images/*/*/*"))

# print number of images in each dataset
print('There are %d total human images.' % len(human_files))
print('There are %d total dog images.' % len(dog_files))
There are 13233 total human images.
There are 8351 total dog images.

Step 1: Detect Humans

In this section, we use OpenCV's implementation of Haar feature-based cascade classifiers to detect human faces in images.

OpenCV provides many pre-trained face detectors, stored as XML files on github. We have downloaded one of these detectors and stored it in the haarcascades directory. In the next code cell, we demonstrate how to use this detector to find human faces in a sample image.

In [2]:
import cv2                
import matplotlib.pyplot as plt                        
%matplotlib inline                               

# extract pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

# load color (BGR) image
img = cv2.imread(human_files[20])
# convert BGR image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# find faces in image
faces = face_cascade.detectMultiScale(gray)

# print number of faces detected in the image
print('Number of faces detected:', len(faces))

# get bounding box for each detected face
for (x,y,w,h) in faces:
    # add bounding box to color image
    cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
    
# convert BGR image to RGB for plotting
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# display the image, along with bounding box
plt.imshow(cv_rgb)
plt.show()
Number of faces detected: 1

Before using any of the face detectors, it is standard procedure to convert the images to grayscale. The detectMultiScale function executes the classifier stored in face_cascade and takes the grayscale image as a parameter.

In the above code, faces is a numpy array of detected faces, where each row corresponds to a detected face. Each detected face is a 1D array with four entries that specifies the bounding box of the detected face. The first two entries in the array (extracted in the above code as x and y) specify the horizontal and vertical positions of the top left corner of the bounding box. The last two entries in the array (extracted here as w and h) specify the width and height of the box.

Write a Human Face Detector

We can use this procedure to write a function that returns True if a human face is detected in an image and False otherwise. This function, aptly named face_detector, takes a string-valued file path to an image as input and appears in the code block below.

In [3]:
# returns "True" if face is detected in image stored at img_path
def face_detector(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0

(IMPLEMENTATION) Assess the Human Face Detector

Question 1: Use the code cell below to test the performance of the face_detector function.

  • What percentage of the first 100 images in human_files have a detected human face?
  • What percentage of the first 100 images in dog_files have a detected human face?

Ideally, we would like 100% of human images with a detected face and 0% of dog images with a detected face. You will see that our algorithm falls short of this goal, but still gives acceptable performance. We extract the file paths for the first 100 images from each of the datasets and store them in the numpy arrays human_files_short and dog_files_short.

Answer: (You can print out your results and/or write your percentages in this cell)

In [4]:
from tqdm import tqdm

human_files_short = human_files[:100]
dog_files_short = dog_files[:100]

#-#-# Do NOT modify the code above this line. #-#-#


## TODO: Test the performance of the face_detector algorithm 
## on the images in human_files_short and dog_files_short.

def face_detection_test(files):
    detection_cnt = 0;
    total_cnt = len(files)
    for file in files:
        detection_cnt += face_detector(file)
    return detection_cnt, total_cnt

We suggest the face detector from OpenCV as a potential way to detect human images in your algorithm, but you are free to explore other approaches, especially approaches that make use of deep learning :). Please use the code cell below to design and test your own face detection algorithm. If you decide to pursue this optional task, report performance on human_files_short and dog_files_short.

In [5]:
### (Optional) 
### TODO: Test performance of anotherface detection algorithm.
### Feel free to use as many code cells as needed.
print("detect face in human_files: {} / {}".format(face_detection_test(human_files_short)[0], face_detection_test(human_files_short)[1]))
print("detect face in dog_files: {} / {}".format(face_detection_test(dog_files_short)[0], face_detection_test(dog_files_short)[1]))
detect face in human_files: 98 / 100
detect face in dog_files: 17 / 100

Step 2: Detect Dogs

In this section, we use a pre-trained model to detect dogs in images.

Obtain Pre-trained VGG-16 Model

The code cell below downloads the VGG-16 model, along with weights that have been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories.

In [6]:
import torch
import torchvision.models as models

# define VGG16 model
VGG16 = models.vgg16(pretrained=True)

# check if CUDA is available
use_cuda = torch.cuda.is_available()

# move model to GPU if CUDA is available
if use_cuda:
    VGG16 = VGG16.cuda()

Given an image, this pre-trained VGG-16 model returns a prediction (derived from the 1000 possible categories in ImageNet) for the object that is contained in the image.

(IMPLEMENTATION) Making Predictions with a Pre-trained Model

In the next code cell, you will write a function that accepts a path to an image (such as 'dogImages/train/001.Affenpinscher/Affenpinscher_00001.jpg') as input and returns the index corresponding to the ImageNet class that is predicted by the pre-trained VGG-16 model. The output should always be an integer between 0 and 999, inclusive.

Before writing the function, make sure that you take the time to learn how to appropriately pre-process tensors for pre-trained models in the PyTorch documentation.

In [7]:
from PIL import Image
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
%matplotlib inline

def VGG16_predict(img_path):
    '''
    Use pre-trained VGG-16 model to obtain index corresponding to 
    predicted ImageNet class for image at specified path
    
    Args:
        img_path: path to an image
        
    Returns:
        Index corresponding to VGG-16 model's prediction
    '''
    
    ## TODO: Complete the function.
    ## Load and pre-process an image from the given img_path
    ## Return the *index* of the predicted class for that image
    
    # Import image from img_path in PIL format
    img = Image.open(img_path)

    # Define transformations of image
    preprocess = transforms.Compose([transforms.Resize(256),
                                     transforms.CenterCrop(224),
                                     transforms.ToTensor(),
                                     transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225))])

    # Preprocess image to 4D Tensor (.unsqueeze(0) adds a dimension)
    img_tensor = preprocess(img).unsqueeze_(0)

    # Move tensor to GPU if available
    if use_cuda:
        img_tensor = img_tensor.cuda()      

    # Get predicted category for image
    with torch.no_grad():
        output = VGG16(img_tensor)
        prediction = torch.argmax(output).item()
    
    return prediction # predicted class index

(IMPLEMENTATION) Write a Dog Detector

While looking at the dictionary, you will notice that the categories corresponding to dogs appear in an uninterrupted sequence and correspond to dictionary keys 151-268, inclusive, to include all categories from 'Chihuahua' to 'Mexican hairless'. Thus, in order to check to see if an image is predicted to contain a dog by the pre-trained VGG-16 model, we need only check if the pre-trained model predicts an index between 151 and 268 (inclusive).

Use these ideas to complete the dog_detector function below, which returns True if a dog is detected in an image (and False if not).

In [8]:
# predict dog using ImageNet class
VGG16_predict(dog_files_short[42])
Out[8]:
243
In [9]:
### returns "True" if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    ## TODO: Complete the function.
    idx = VGG16_predict(img_path)
    # return idx >= 151 and idx <= 268 # true/false
    return 151 <= idx <= 268 # true/false
In [10]:
import matplotlib.pyplot as plt
%matplotlib inline

def show_image_being_checked(img_path):
    img = Image.open(img_path)
    fig = plt.figure(figsize = (5,5)) 
    ax = fig.add_subplot(111)
    ax.imshow(img)
In [11]:
index_to_be_checked = 35

print("Is there a dog: ", dog_detector(dog_files_short[index_to_be_checked]))
print("Is there a dog: ", dog_detector(human_files_short[index_to_be_checked]))
show_image_being_checked(dog_files_short[index_to_be_checked])
show_image_being_checked(human_files_short[index_to_be_checked])
Is there a dog:  True
Is there a dog:  False

(IMPLEMENTATION) Assess the Dog Detector

Question 2: Use the code cell below to test the performance of your dog_detector function.

  • What percentage of the images in human_files_short have a detected dog?
  • What percentage of the images in dog_files_short have a detected dog?

Answer:

In [12]:
### TODO: Test the performance of the dog_detector function
### on the images in human_files_short and dog_files_short.
def dog_detector_test(files):
    detection_cnt = 0;
    total_cnt = len(files)
    for file in files:
        detection_cnt += dog_detector(file)
    return detection_cnt, total_cnt

print("detect a dog in human_files: {} / {}".format(dog_detector_test(human_files_short)[0], dog_detector_test(human_files_short)[1]))
print("detect a dog in dog_files: {} / {}".format(dog_detector_test(dog_files_short)[0], dog_detector_test(dog_files_short)[1]))
detect a dog in human_files: 1 / 100
detect a dog in dog_files: 100 / 100

We suggest VGG-16 as a potential network to detect dog images in your algorithm, but you are free to explore other pre-trained networks (such as Inception-v3, ResNet-50, etc). Please use the code cell below to test other pre-trained PyTorch models. If you decide to pursue this optional task, report performance on human_files_short and dog_files_short.


Step 3: Create a CNN to Classify Dog Breeds (from Scratch)

Now that we have functions for detecting humans and dogs in images, we need a way to predict breed from images. In this step, you will create a CNN that classifies dog breeds. You must create your CNN from scratch (so, you can't use transfer learning yet!), and you must attain a test accuracy of at least 10%. In Step 4 of this notebook, you will have the opportunity to use transfer learning to create a CNN that attains greatly improved accuracy.

We mention that the task of assigning breed to dogs from images is considered exceptionally challenging. To see why, consider that even a human would have trouble distinguishing between a Brittany and a Welsh Springer Spaniel.

Brittany Welsh Springer Spaniel

It is not difficult to find other dog breed pairs with minimal inter-class variation (for instance, Curly-Coated Retrievers and American Water Spaniels).

Curly-Coated Retriever American Water Spaniel

Likewise, recall that labradors come in yellow, chocolate, and black. Your vision-based algorithm will have to conquer this high intra-class variation to determine how to classify all of these different shades as the same breed.

Yellow Labrador Chocolate Labrador Black Labrador

We also mention that random chance presents an exceptionally low bar: setting aside the fact that the classes are slightly imabalanced, a random guess will provide a correct answer roughly 1 in 133 times, which corresponds to an accuracy of less than 1%.

Remember that the practice is far ahead of the theory in deep learning. Experiment with many different architectures, and trust your intuition. And, of course, have fun!

(IMPLEMENTATION) Specify Data Loaders for the Dog Dataset

Use the code cell below to write three separate data loaders for the training, validation, and test datasets of dog images (located at dog_images/train, dog_images/valid, and dog_images/test, respectively). You may find this documentation on custom datasets to be a useful resource. If you are interested in augmenting your training and/or validation data, check out the wide variety of transforms!

In [13]:
import os
from torchvision import datasets
import torchvision.transforms as transforms
import torch
import numpy as np
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

### TODO: Write data loaders for training, validation, and test sets
## Specify appropriate transforms, and batch_sizes

batch_size = 20
num_workers = 0

data_dir = 'dog_images/'

train_transforms = transforms.Compose([transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.RandomRotation(10),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

valid_transforms = transforms.Compose([transforms.Resize(size=(224,224)),
                                           transforms.ToTensor(),
                                           transforms.Normalize([0.5, 0.5, 0.5],
                                                                [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(size=(224,224)),
                                           transforms.ToTensor(),
                                           transforms.Normalize([0.5, 0.5, 0.5],
                                                                [0.229, 0.224, 0.225])])

train_dataset = datasets.ImageFolder(os.path.join(data_dir, 'train'), transform=train_transforms)
valid_dataset = datasets.ImageFolder(os.path.join(data_dir, 'valid'), transform=valid_transforms)
test_dataset = datasets.ImageFolder(os.path.join(data_dir, 'test'), transform=test_transforms)

trainLoader = torch.utils.data.DataLoader(train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True,
                                          num_workers=num_workers)

validLoader = torch.utils.data.DataLoader(valid_dataset,
                                          batch_size=batch_size,
                                          shuffle=True,
                                          num_workers=num_workers)

testLoader = torch.utils.data.DataLoader(test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False,
                                          num_workers=num_workers)

loaders_scratch = {
    'train': trainLoader,
    'valid': validLoader,
    'test': testLoader
}

Question 3: Describe your chosen procedure for preprocessing the data.

  • How does your code resize the images (by cropping, stretching, etc)? What size did you pick for the input tensor, and why?
  • Did you decide to augment the dataset? If so, how (through translations, flips, rotations, etc)? If not, why not?

Answer: I have applied RandomResizedCrop, RandomHorizontalFlip and RandomRotation by 10 degrees to just train_data. This will do both data (image) augmentations (to extend a dataset and improve generalization when training the model) and resizing jobs. Image augmentation will give randomness to the dataset so, it prevents overfitting and should lead to better model performance when working on test_data.

For Validation data, implemented Resizing of (256) and then, center crop to make 224 X 224. Since valid_data will be used for validation check, image augmentation is not needed.

For the test_data, only image resizing is used.

I read, that a dataset of around 10,000 images should be more than enough to achieve an accuracy of greater than 10%. The data set almost provides enough samples, but still I augmentation to reduce over-fitting and make the model more general.

(IMPLEMENTATION) Model Architecture

Create a CNN to classify dog breed. Use the template in the code cell below.

In [14]:
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

num_classes = 133 # total classes of dog breeds

import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# define the CNN architecture
class Net(nn.Module):
    ### TODO: choose an architecture, and complete the class
    def __init__(self):
        super(Net, self).__init__()
        ## Define layers of a CNN
        self.conv1 = nn.Conv2d(3, 32, 2, stride=2, padding=0)
        self.conv2 = nn.Conv2d(32, 64, 2, stride=2, padding=0)
        self.conv3 = nn.Conv2d(64, 128, 2, padding=1)

        # pool
        self.pool = nn.MaxPool2d(2, 2)
        
        # fully-connected
        self.fc1 = nn.Linear(128*7*7, 500)
        self.fc2 = nn.Linear(500, num_classes) 
        
        # drop-out
        self.dropout = nn.Dropout(0.3)
    
    def forward(self, x):
        ## Define forward behavior
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = F.relu(self.conv3(x))
        x = self.pool(x)
        
        # flatten
        x = x.view(-1, 7*7*128)
        
        x = self.dropout(x)
        x = F.relu(self.fc1(x))
        
        x = self.dropout(x)
        x = self.fc2(x)
        return x

#-#-# You so NOT have to modify the code below this line. #-#-#

# instantiate the CNN
model_scratch = Net()
print(model_scratch)

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()
Net(
  (conv1): Conv2d(3, 32, kernel_size=(2, 2), stride=(2, 2))
  (conv2): Conv2d(32, 64, kernel_size=(2, 2), stride=(2, 2))
  (conv3): Conv2d(64, 128, kernel_size=(2, 2), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=6272, out_features=500, bias=True)
  (fc2): Linear(in_features=500, out_features=133, bias=True)
  (dropout): Dropout(p=0.3)
)

Question 4: Outline the steps you took to get to your final CNN architecture and your reasoning at each step.

Answer:

1) Convolutional Layer # 1, with filter_size =2, stride_size=2 and no padding, receives the input of an image as a 224x224x3 tensor. This produces a 112x112x32 tensor which has 32 filters. This goes through a pooling layer, which halves it to a 56x56x32 tensor.

2) Convolutional Layer # 2, with filter_size =2, stride_size=2 and padding = 0, receives the 56x56x32 tensor and produces a 28x28x64 tensor with 64 filters. Again, this goes through a pooling layer, which halves it to a 14x14x64 tensor.

3) Convolutional Layer # 3, with filter_size =2 and padding = 1, receives the 14x14x64 tensor and produces a 14x14x128 tensor with 128 filters. Again, this goes through a pooling layer, which halves it to a 7x7x128 tensor.

4) The tensor is flattened to be fit to be a input to the 12877 sized first fully connected layer.

5) The tensor is then passed through a dropout layer to increase the generalization ability of the architecture.

6) The tensor passed through the fully connected (linear) layer with a 12877 input tensor and a 500 output tensor.

7) The tensor is again passed through a dropout layer to increase the generalization ability of the architecture.

8) Finally, the tensor passed through the fully connected layer with a 500 input tensor and a 133 output tensor(selected to match number of outputs to the number of classes (133 classes of dog breeds)).

(IMPLEMENTATION) Specify Loss Function and Optimizer

Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_scratch, and the optimizer as optimizer_scratch below.

In [15]:
import torch.optim as optim

### TODO: select loss function
criterion_scratch = nn.CrossEntropyLoss()

### TODO: select optimizer
optimizer_scratch = optim.SGD(model_scratch.parameters(), lr = 0.05)

(IMPLEMENTATION) Train and Validate the Model

Train and validate your model in the code cell below. Save the final model parameters at filepath 'model_scratch.pt'.

In [16]:
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path, last_validation_loss=None):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    if last_validation_loss is not None:
        valid_loss_min = last_validation_loss
    else:
        valid_loss_min = np.Inf
    
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        
        ###################
        # train the model #
        ###################
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            ## find the loss and update the model parameters accordingly
            ## record the average training loss, using something like
            ## train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))

            # initialize weights to zero
            optimizer.zero_grad()
            
            output = model(data)
            
            # calculate loss
            loss = criterion(output, target)
            
            # back prop
            loss.backward()
            
            # grad
            optimizer.step()
            
            train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))
            
            if batch_idx % 100 == 0:
                print('Epoch %d, Batch %d loss: %.6f' %
                  (epoch, batch_idx + 1, train_loss))
            
        ######################    
        # validate the model #
        ######################
        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['valid']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            ## update the average validation loss
            output = model(data)
            loss = criterion(output, target)
            valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.data - valid_loss))

            
        # print training/validation statistics 
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, 
            train_loss,
            valid_loss
            ))
        
        ## TODO: save the model if validation loss has decreased
        if valid_loss < valid_loss_min:
            torch.save(model.state_dict(), save_path)
            print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            valid_loss_min,
            valid_loss))
            valid_loss_min = valid_loss
            
    # return trained model
    return model

(IMPLEMENTATION) Test the Model

Try out your model on the test dataset of dog images. Use the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 10%.

In [17]:
# train the model
model_scratch = train(25, loaders_scratch, model_scratch, optimizer_scratch, 
                      criterion_scratch, use_cuda, 'model_scratch.pt')
Epoch 1, Batch 1 loss: 4.885856
Epoch 1, Batch 101 loss: 4.886631
Epoch 1, Batch 201 loss: 4.883122
Epoch 1, Batch 301 loss: 4.879823
Epoch: 1 	Training Loss: 4.876696 	Validation Loss: 4.852567
Validation loss decreased (inf --> 4.852567).  Saving model ...
Epoch 2, Batch 1 loss: 4.900702
Epoch 2, Batch 101 loss: 4.831778
Epoch 2, Batch 201 loss: 4.826902
Epoch 2, Batch 301 loss: 4.812554
Epoch: 2 	Training Loss: 4.807890 	Validation Loss: 4.734018
Validation loss decreased (4.852567 --> 4.734018).  Saving model ...
Epoch 3, Batch 1 loss: 4.700637
Epoch 3, Batch 101 loss: 4.706649
Epoch 3, Batch 201 loss: 4.684307
Epoch 3, Batch 301 loss: 4.663776
Epoch: 3 	Training Loss: 4.660213 	Validation Loss: 4.585538
Validation loss decreased (4.734018 --> 4.585538).  Saving model ...
Epoch 4, Batch 1 loss: 4.416874
Epoch 4, Batch 101 loss: 4.588737
Epoch 4, Batch 201 loss: 4.582261
Epoch 4, Batch 301 loss: 4.575166
Epoch: 4 	Training Loss: 4.573720 	Validation Loss: 4.491154
Validation loss decreased (4.585538 --> 4.491154).  Saving model ...
Epoch 5, Batch 1 loss: 4.308277
Epoch 5, Batch 101 loss: 4.495371
Epoch 5, Batch 201 loss: 4.507907
Epoch 5, Batch 301 loss: 4.510949
Epoch: 5 	Training Loss: 4.504839 	Validation Loss: 4.494208
Epoch 6, Batch 1 loss: 4.411390
Epoch 6, Batch 101 loss: 4.442616
Epoch 6, Batch 201 loss: 4.455127
Epoch 6, Batch 301 loss: 4.452613
Epoch: 6 	Training Loss: 4.444368 	Validation Loss: 4.488962
Validation loss decreased (4.491154 --> 4.488962).  Saving model ...
Epoch 7, Batch 1 loss: 4.556818
Epoch 7, Batch 101 loss: 4.429084
Epoch 7, Batch 201 loss: 4.424972
Epoch 7, Batch 301 loss: 4.406979
Epoch: 7 	Training Loss: 4.400149 	Validation Loss: 4.677376
Epoch 8, Batch 1 loss: 4.302762
Epoch 8, Batch 101 loss: 4.340216
Epoch 8, Batch 201 loss: 4.336204
Epoch 8, Batch 301 loss: 4.340586
Epoch: 8 	Training Loss: 4.338662 	Validation Loss: 4.267602
Validation loss decreased (4.488962 --> 4.267602).  Saving model ...
Epoch 9, Batch 1 loss: 4.536035
Epoch 9, Batch 101 loss: 4.269617
Epoch 9, Batch 201 loss: 4.290830
Epoch 9, Batch 301 loss: 4.290415
Epoch: 9 	Training Loss: 4.288323 	Validation Loss: 4.353278
Epoch 10, Batch 1 loss: 4.155346
Epoch 10, Batch 101 loss: 4.218344
Epoch 10, Batch 201 loss: 4.233316
Epoch 10, Batch 301 loss: 4.230550
Epoch: 10 	Training Loss: 4.227028 	Validation Loss: 4.206638
Validation loss decreased (4.267602 --> 4.206638).  Saving model ...
Epoch 11, Batch 1 loss: 4.013055
Epoch 11, Batch 101 loss: 4.155626
Epoch 11, Batch 201 loss: 4.182014
Epoch 11, Batch 301 loss: 4.185185
Epoch: 11 	Training Loss: 4.182541 	Validation Loss: 4.147748
Validation loss decreased (4.206638 --> 4.147748).  Saving model ...
Epoch 12, Batch 1 loss: 3.978252
Epoch 12, Batch 101 loss: 4.126458
Epoch 12, Batch 201 loss: 4.141610
Epoch 12, Batch 301 loss: 4.140405
Epoch: 12 	Training Loss: 4.148808 	Validation Loss: 4.195004
Epoch 13, Batch 1 loss: 4.550745
Epoch 13, Batch 101 loss: 4.079791
Epoch 13, Batch 201 loss: 4.081433
Epoch 13, Batch 301 loss: 4.092875
Epoch: 13 	Training Loss: 4.094350 	Validation Loss: 4.068678
Validation loss decreased (4.147748 --> 4.068678).  Saving model ...
Epoch 14, Batch 1 loss: 4.208064
Epoch 14, Batch 101 loss: 4.045722
Epoch 14, Batch 201 loss: 4.038850
Epoch 14, Batch 301 loss: 4.034258
Epoch: 14 	Training Loss: 4.041825 	Validation Loss: 4.246012
Epoch 15, Batch 1 loss: 3.977729
Epoch 15, Batch 101 loss: 3.996008
Epoch 15, Batch 201 loss: 4.007264
Epoch 15, Batch 301 loss: 3.999873
Epoch: 15 	Training Loss: 4.000205 	Validation Loss: 4.186264
Epoch 16, Batch 1 loss: 3.936841
Epoch 16, Batch 101 loss: 3.983598
Epoch 16, Batch 201 loss: 3.990140
Epoch 16, Batch 301 loss: 3.997200
Epoch: 16 	Training Loss: 4.003581 	Validation Loss: 4.054395
Validation loss decreased (4.068678 --> 4.054395).  Saving model ...
Epoch 17, Batch 1 loss: 3.900873
Epoch 17, Batch 101 loss: 3.915105
Epoch 17, Batch 201 loss: 3.925333
Epoch 17, Batch 301 loss: 3.939068
Epoch: 17 	Training Loss: 3.931944 	Validation Loss: 4.247539
Epoch 18, Batch 1 loss: 3.552700
Epoch 18, Batch 101 loss: 3.880611
Epoch 18, Batch 201 loss: 3.876248
Epoch 18, Batch 301 loss: 3.889548
Epoch: 18 	Training Loss: 3.895605 	Validation Loss: 3.935856
Validation loss decreased (4.054395 --> 3.935856).  Saving model ...
Epoch 19, Batch 1 loss: 3.704153
Epoch 19, Batch 101 loss: 3.841036
Epoch 19, Batch 201 loss: 3.844674
Epoch 19, Batch 301 loss: 3.859274
Epoch: 19 	Training Loss: 3.856133 	Validation Loss: 4.115299
Epoch 20, Batch 1 loss: 3.782399
Epoch 20, Batch 101 loss: 3.788813
Epoch 20, Batch 201 loss: 3.820086
Epoch 20, Batch 301 loss: 3.845252
Epoch: 20 	Training Loss: 3.848016 	Validation Loss: 4.004457
Epoch 21, Batch 1 loss: 3.400081
Epoch 21, Batch 101 loss: 3.730472
Epoch 21, Batch 201 loss: 3.771528
Epoch 21, Batch 301 loss: 3.784602
Epoch: 21 	Training Loss: 3.784505 	Validation Loss: 3.946776
Epoch 22, Batch 1 loss: 3.819806
Epoch 22, Batch 101 loss: 3.713667
Epoch 22, Batch 201 loss: 3.749726
Epoch 22, Batch 301 loss: 3.773268
Epoch: 22 	Training Loss: 3.767153 	Validation Loss: 4.427404
Epoch 23, Batch 1 loss: 3.816243
Epoch 23, Batch 101 loss: 3.796219
Epoch 23, Batch 201 loss: 3.789028
Epoch 23, Batch 301 loss: 3.751629
Epoch: 23 	Training Loss: 3.747857 	Validation Loss: 3.969118
Epoch 24, Batch 1 loss: 3.696667
Epoch 24, Batch 101 loss: 3.671636
Epoch 24, Batch 201 loss: 3.673515
Epoch 24, Batch 301 loss: 3.700973
Epoch: 24 	Training Loss: 3.705796 	Validation Loss: 4.132953
Epoch 25, Batch 1 loss: 3.344269
Epoch 25, Batch 101 loss: 3.610570
Epoch 25, Batch 201 loss: 3.643476
Epoch 25, Batch 301 loss: 3.668215
Epoch: 25 	Training Loss: 3.668607 	Validation Loss: 3.807688
Validation loss decreased (3.935856 --> 3.807688).  Saving model ...
In [18]:
# load the model that got the best validation accuracy
model_scratch.load_state_dict(torch.load('model_scratch.pt'))
In [19]:
def test(loaders, model, criterion, use_cuda):

    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.

    model.eval()
    for batch_idx, (data, target) in enumerate(loaders['test']):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update average test loss 
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data - test_loss))
        # convert output probabilities to predicted class
        pred = output.data.max(1, keepdim=True)[1]
        # compare predictions to true label
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        total += data.size(0)
            
    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))

# call test function    
test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)
Test Loss: 3.777946


Test Accuracy: 13% (109/836)

Step 4: Create a CNN to Classify Dog Breeds (using Transfer Learning)

You will now use transfer learning to create a CNN that can identify dog breed from images. Your CNN must attain at least 60% accuracy on the test set.

(IMPLEMENTATION) Specify Data Loaders for the Dog Dataset

Use the code cell below to write three separate data loaders for the training, validation, and test datasets of dog images (located at dogImages/train, dogImages/valid, and dogImages/test, respectively).

If you like, you are welcome to use the same data loaders from the previous step, when you created a CNN from scratch.

In [20]:
## TODO: Specify data loaders
loaders_transfer = loaders_scratch.copy()

(IMPLEMENTATION) Model Architecture

Use transfer learning to create a CNN to classify dog breed. Use the code cell below, and save your initialized model as the variable model_transfer.

In [21]:
import torchvision.models as models
import torch.nn as nn

## TODO: Specify model architecture 
model_transfer = models.resnet50(pretrained=True)

for param in model_transfer.parameters():
    param.requires_grad = False

model_transfer.fc = nn.Linear(2048, 133, bias=True)

fc_parameters = model_transfer.fc.parameters()

for param in fc_parameters:
    param.requires_grad = True

model_transfer

if use_cuda:
    model_transfer = model_transfer.cuda()
Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.torch/models/resnet50-19c8e357.pth
100%|██████████| 102502400/102502400 [00:01<00:00, 90456700.91it/s]

Question 5: Outline the steps you took to get to your final CNN architecture and your reasoning at each step. Describe why you think the architecture is suitable for the current problem.

Answer: ResNet has high outstanding performance when it comes to Image classification,and that is why ResNet was chosen for Transfer Learning. A new classifying layer was added, while keeping rest of the architecture same as of the original model. The parameters of the convolutional layers are kept the same, and only the weight of the new layers were gradually updated. The final test accuracy was around 70% which is over and above the expected (60%) and denotes good performance. I think this is because of ResNet's feature of "Identity shortcut connection", which skips one or more layers and thus prevents overfitting when training.

(IMPLEMENTATION) Specify Loss Function and Optimizer

Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_transfer, and the optimizer as optimizer_transfer below.

In [22]:
criterion_transfer = nn.CrossEntropyLoss()
optimizer_transfer = optim.SGD(model_transfer.fc.parameters(), lr=0.001)

(IMPLEMENTATION) Train and Validate the Model

Train and validate your model in the code cell below. Save the final model parameters at filepath 'model_transfer.pt'.

In [23]:
# train the model
# train(n_epochs, loaders_transfer, model_transfer, optimizer_transfer, criterion_transfer, use_cuda, 'model_transfer.pt')

def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.Inf
    
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        
        ###################
        # train the model #
        ###################
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()

            # initialize weights to zero
            optimizer.zero_grad()
            
            output = model(data)
            
            # calculate loss
            loss = criterion(output, target)
            
            # back prop
            loss.backward()
            
            # grad
            optimizer.step()
            
            train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))
            
            if batch_idx % 100 == 0:
                print('Epoch %d, Batch %d loss: %.6f' %
                  (epoch, batch_idx + 1, train_loss))
        
        ######################    
        # validate the model #
        ######################
        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['valid']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            ## update the average validation loss
            output = model(data)
            loss = criterion(output, target)
            valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.data - valid_loss))

            
        # print training/validation statistics 
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, 
            train_loss,
            valid_loss
            ))
        
        ## TODO: save the model if validation loss has decreased
        if valid_loss < valid_loss_min:
            torch.save(model.state_dict(), save_path)
            print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            valid_loss_min,
            valid_loss))
            valid_loss_min = valid_loss
            
    # return trained model
    return model

train(20, loaders_transfer, model_transfer, optimizer_transfer, criterion_transfer, use_cuda, 'model_transfer.pt')
Epoch 1, Batch 1 loss: 5.044183
Epoch 1, Batch 101 loss: 4.894135
Epoch 1, Batch 201 loss: 4.864468
Epoch 1, Batch 301 loss: 4.829885
Epoch: 1 	Training Loss: 4.820352 	Validation Loss: 4.649954
Validation loss decreased (inf --> 4.649954).  Saving model ...
Epoch 2, Batch 1 loss: 4.711864
Epoch 2, Batch 101 loss: 4.675890
Epoch 2, Batch 201 loss: 4.651938
Epoch 2, Batch 301 loss: 4.630519
Epoch: 2 	Training Loss: 4.622760 	Validation Loss: 4.425070
Validation loss decreased (4.649954 --> 4.425070).  Saving model ...
Epoch 3, Batch 1 loss: 4.383416
Epoch 3, Batch 101 loss: 4.503475
Epoch 3, Batch 201 loss: 4.479043
Epoch 3, Batch 301 loss: 4.457779
Epoch: 3 	Training Loss: 4.449836 	Validation Loss: 4.225709
Validation loss decreased (4.425070 --> 4.225709).  Saving model ...
Epoch 4, Batch 1 loss: 4.253612
Epoch 4, Batch 101 loss: 4.333853
Epoch 4, Batch 201 loss: 4.314296
Epoch 4, Batch 301 loss: 4.297244
Epoch: 4 	Training Loss: 4.289755 	Validation Loss: 4.039047
Validation loss decreased (4.225709 --> 4.039047).  Saving model ...
Epoch 5, Batch 1 loss: 4.333025
Epoch 5, Batch 101 loss: 4.162959
Epoch 5, Batch 201 loss: 4.156671
Epoch 5, Batch 301 loss: 4.134840
Epoch: 5 	Training Loss: 4.129529 	Validation Loss: 3.830328
Validation loss decreased (4.039047 --> 3.830328).  Saving model ...
Epoch 6, Batch 1 loss: 4.046906
Epoch 6, Batch 101 loss: 4.030524
Epoch 6, Batch 201 loss: 4.002583
Epoch 6, Batch 301 loss: 3.977555
Epoch: 6 	Training Loss: 3.973897 	Validation Loss: 3.652886
Validation loss decreased (3.830328 --> 3.652886).  Saving model ...
Epoch 7, Batch 1 loss: 3.662252
Epoch 7, Batch 101 loss: 3.867653
Epoch 7, Batch 201 loss: 3.859673
Epoch 7, Batch 301 loss: 3.845044
Epoch: 7 	Training Loss: 3.836735 	Validation Loss: 3.495594
Validation loss decreased (3.652886 --> 3.495594).  Saving model ...
Epoch 8, Batch 1 loss: 3.872625
Epoch 8, Batch 101 loss: 3.742964
Epoch 8, Batch 201 loss: 3.729483
Epoch 8, Batch 301 loss: 3.705002
Epoch: 8 	Training Loss: 3.701895 	Validation Loss: 3.320663
Validation loss decreased (3.495594 --> 3.320663).  Saving model ...
Epoch 9, Batch 1 loss: 3.854538
Epoch 9, Batch 101 loss: 3.629705
Epoch 9, Batch 201 loss: 3.590029
Epoch 9, Batch 301 loss: 3.573706
Epoch: 9 	Training Loss: 3.565134 	Validation Loss: 3.146697
Validation loss decreased (3.320663 --> 3.146697).  Saving model ...
Epoch 10, Batch 1 loss: 3.417363
Epoch 10, Batch 101 loss: 3.501801
Epoch 10, Batch 201 loss: 3.479370
Epoch 10, Batch 301 loss: 3.444540
Epoch: 10 	Training Loss: 3.441517 	Validation Loss: 3.027229
Validation loss decreased (3.146697 --> 3.027229).  Saving model ...
Epoch 11, Batch 1 loss: 3.308566
Epoch 11, Batch 101 loss: 3.371197
Epoch 11, Batch 201 loss: 3.350598
Epoch 11, Batch 301 loss: 3.332960
Epoch: 11 	Training Loss: 3.330420 	Validation Loss: 2.889969
Validation loss decreased (3.027229 --> 2.889969).  Saving model ...
Epoch 12, Batch 1 loss: 3.321592
Epoch 12, Batch 101 loss: 3.254152
Epoch 12, Batch 201 loss: 3.241922
Epoch 12, Batch 301 loss: 3.225713
Epoch: 12 	Training Loss: 3.218555 	Validation Loss: 2.785198
Validation loss decreased (2.889969 --> 2.785198).  Saving model ...
Epoch 13, Batch 1 loss: 3.062560
Epoch 13, Batch 101 loss: 3.144298
Epoch 13, Batch 201 loss: 3.132062
Epoch 13, Batch 301 loss: 3.114970
Epoch: 13 	Training Loss: 3.112477 	Validation Loss: 2.616147
Validation loss decreased (2.785198 --> 2.616147).  Saving model ...
Epoch 14, Batch 1 loss: 3.030813
Epoch 14, Batch 101 loss: 3.018771
Epoch 14, Batch 201 loss: 3.016935
Epoch 14, Batch 301 loss: 3.029758
Epoch: 14 	Training Loss: 3.028542 	Validation Loss: 2.527945
Validation loss decreased (2.616147 --> 2.527945).  Saving model ...
Epoch 15, Batch 1 loss: 3.048309
Epoch 15, Batch 101 loss: 2.961159
Epoch 15, Batch 201 loss: 2.959768
Epoch 15, Batch 301 loss: 2.940480
Epoch: 15 	Training Loss: 2.938333 	Validation Loss: 2.417693
Validation loss decreased (2.527945 --> 2.417693).  Saving model ...
Epoch 16, Batch 1 loss: 2.983096
Epoch 16, Batch 101 loss: 2.863965
Epoch 16, Batch 201 loss: 2.873852
Epoch 16, Batch 301 loss: 2.854904
Epoch: 16 	Training Loss: 2.856272 	Validation Loss: 2.331551
Validation loss decreased (2.417693 --> 2.331551).  Saving model ...
Epoch 17, Batch 1 loss: 2.660032
Epoch 17, Batch 101 loss: 2.791194
Epoch 17, Batch 201 loss: 2.775873
Epoch 17, Batch 301 loss: 2.775273
Epoch: 17 	Training Loss: 2.772359 	Validation Loss: 2.232952
Validation loss decreased (2.331551 --> 2.232952).  Saving model ...
Epoch 18, Batch 1 loss: 2.814512
Epoch 18, Batch 101 loss: 2.723233
Epoch 18, Batch 201 loss: 2.701512
Epoch 18, Batch 301 loss: 2.696891
Epoch: 18 	Training Loss: 2.698925 	Validation Loss: 2.158649
Validation loss decreased (2.232952 --> 2.158649).  Saving model ...
Epoch 19, Batch 1 loss: 2.843835
Epoch 19, Batch 101 loss: 2.629277
Epoch 19, Batch 201 loss: 2.624386
Epoch 19, Batch 301 loss: 2.617520
Epoch: 19 	Training Loss: 2.614392 	Validation Loss: 2.092497
Validation loss decreased (2.158649 --> 2.092497).  Saving model ...
Epoch 20, Batch 1 loss: 2.662182
Epoch 20, Batch 101 loss: 2.576125
Epoch 20, Batch 201 loss: 2.550067
Epoch 20, Batch 301 loss: 2.538024
Epoch: 20 	Training Loss: 2.532785 	Validation Loss: 2.014350
Validation loss decreased (2.092497 --> 2.014350).  Saving model ...
Out[23]:
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (layer2): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (3): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (layer3): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (3): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (4): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (5): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (layer4): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (avgpool): AvgPool2d(kernel_size=7, stride=1, padding=0)
  (fc): Linear(in_features=2048, out_features=133, bias=True)
)
In [ ]:
# load the model that got the best validation accuracy
model_transfer.load_state_dict(torch.load('model_transfer.pt'))

(IMPLEMENTATION) Test the Model

Try out your model on the test dataset of dog images. Use the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 60%.

In [24]:
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)
Test Loss: 2.001006


Test Accuracy: 69% (584/836)

(IMPLEMENTATION) Predict Dog Breed with the Model

Write a function that takes an image path as input and returns the dog breed (Affenpinscher, Afghan hound, etc) that is predicted by your model.

In [25]:
### TODO: Write a function that takes a path to an image as input
### and returns the dog breed that is predicted by the model.

# list of class names by index, i.e. a name can be accessed like class_names[0]
#class_names = [item[4:].replace("_", " ") for item in data_transfer['train'].classes]
class_names = [item[4:].replace("_", " ") for item in train_dataset.classes]

model_transfer.load_state_dict(torch.load('model_transfer.pt'))

def predict_breed_transfer(img_path):
    # load the image and return the predicted breed
    img = Image.open(img_path)


    # Define transformations of image
    preprocess = transforms.Compose([transforms.Resize(256),
                                     transforms.CenterCrop(224),
                                     transforms.ToTensor(),
                                     transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225))])

    # Preprocess image to 4D Tensor (.unsqueeze(0) adds a dimension)
    img_tensor = preprocess(img).unsqueeze_(0)

    # Move tensor to GPU if available
    if use_cuda:
        img_tensor = img_tensor.cuda()
        
    ## Inference
    # Turn on evaluation mode
    model_transfer.eval()
    
    # Get predicted category for image
    with torch.no_grad():
        output = model_transfer(img_tensor)
        prediction = torch.argmax(output).item()
        
    # Turn off evaluation mode
    model_transfer.train()
    
    # Use prediction to get dog breed
    breed = class_names[prediction]
    
    return breed

Step 5: Write your Algorithm

Write an algorithm that accepts a file path to an image and first determines whether the image contains a human, dog, or neither. Then,

  • if a dog is detected in the image, return the predicted breed.
  • if a human is detected in the image, return the resembling dog breed.
  • if neither is detected in the image, provide output that indicates an error.

You are welcome to write your own functions for detecting humans and dogs in images, but feel free to use the face_detector and human_detector functions developed above. You are required to use your CNN from Step 4 to predict dog breed.

Some sample output for our algorithm is provided below, but feel free to design your own user experience!

Sample Human Output

(IMPLEMENTATION) Write your Algorithm

In [26]:
### TODO: Write your algorithm.
### Feel free to use as many code cells as needed.

def run_app(img_path):
    ## handle cases for a human face, dog, and neither
    if face_detector(img_path):
        print('Hello Human!')
        plt.imshow(Image.open(img_path))
        plt.show()
        print(f'You look like a ... {predict_breed_transfer(img_path)}')
        print('\n-----------------------------------\n')
    elif dog_detector(img_path):
        plt.imshow(Image.open(img_path))
        plt.show()
        print(f'This is a picture of a ... {predict_breed_transfer(img_path)}')
        print('\n-----------------------------------\n')
    else:
        plt.imshow(Image.open(img_path))
        plt.show()
        print('Sorry, I did not detect a human or a dog in this image.')
        print('\n-----------------------------------\n')

Step 6: Test Your Algorithm

In this section, you will take your new algorithm for a spin! What kind of dog does the algorithm think that you look like? If you have a dog, does it predict your dog's breed accurately? If you have a cat, does it mistakenly think that your cat is a dog?

(IMPLEMENTATION) Test Your Algorithm on Sample Images!

Test your algorithm at least six images on your computer. Feel free to use any images you like. Use at least two human and two dog images.

Question 6: Is the output better than you expected :) ? Or worse :( ? Provide at least three possible points of improvement for your algorithm.

Answer: (Three possible points for improvement) The putput is better than I expected. I used some random images from the internet, and the model has done a decent job. Still, I believe the points of improvement include: 1) More training data for dog images will improve the training model, and result in higher accuracy. This is more so a case when there are multiple objects in one single image.

2) Further to this, a better feature detector, might also lead to higher accuracy and results.

3) Playing around with weight initializings, learning rates, drop-outs, batch_sizes, and optimizers will be helpful to improve performances.

In [27]:
## TODO: Execute your algorithm from Step 6 on
## at least 6 images on your computer.
## Feel free to use as many code cells as needed.

## suggested code, below
for file in np.hstack((human_files[:3], dog_files[:3])):
    run_app(file)
Hello Human!
You look like a ... Clumber spaniel

-----------------------------------

Hello Human!
You look like a ... Basenji

-----------------------------------

Hello Human!
You look like a ... Cane corso

-----------------------------------

This is a picture of a ... Bullmastiff

-----------------------------------

This is a picture of a ... Bullmastiff

-----------------------------------

This is a picture of a ... Bullmastiff

-----------------------------------

In [28]:
import numpy as np
from glob import glob

# load filenames
files = np.array(glob("/home/workspace/dog_project/images/SJ/*"))
for file_path in files:
    run_app(file_path)
Sorry, I did not detect a human or a dog in this image.

-----------------------------------

This is a picture of a ... Chow chow

-----------------------------------

Hello Human!
You look like a ... Pembroke welsh corgi

-----------------------------------

Sorry, I did not detect a human or a dog in this image.

-----------------------------------

This is a picture of a ... Chesapeake bay retriever

-----------------------------------

This is a picture of a ... Alaskan malamute

-----------------------------------

Sorry, I did not detect a human or a dog in this image.

-----------------------------------

Sorry, I did not detect a human or a dog in this image.

-----------------------------------

Hello Human!
You look like a ... Dogue de bordeaux

-----------------------------------

This is a picture of a ... Nova scotia duck tolling retriever

-----------------------------------

Hello Human!
You look like a ... Cane corso

-----------------------------------

Hello Human!
You look like a ... Basenji

-----------------------------------

This is a picture of a ... Cane corso

-----------------------------------

This is a picture of a ... Golden retriever

-----------------------------------

Sorry, I did not detect a human or a dog in this image.

-----------------------------------

Sorry, I did not detect a human or a dog in this image.

-----------------------------------

This is a picture of a ... French bulldog

-----------------------------------

Hello Human!
You look like a ... Bull terrier

-----------------------------------

Sorry, I did not detect a human or a dog in this image.

-----------------------------------

In [ ]: